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ABSTRACT 

Emergence of robustness in biological networks is a para- 
mount feature of evolving organisms, but a study of this 
property in vivo, for any level of representation such as Ge- 
netic, Metabolic, or Neuronal Networks, is a very hard chal- 
lenge. In the case of Genetic Networks, mathematical mod- 
els have been used in this context to provide insights on their 
robustness, but even in relatively simple formulations, such 
as Boolean Networks (BN), it might not be feasible to com- 
pute some measures for large system sizes. We describe in 
this work a Monte Carlo approach to calculate the size of the 
largest basin of attraction of a BN, which is intrinsically as- 
sociated with its robustness, that can be used regardless the 
network size. We show the stability of our method through 
finite-size analysis and validate it with a full search on small 
networks. 

Categories and Subject Descriptors 

G.3 [Mathematics of Computing]: Probability and Statis- 
tics; 1.1.2 [Computing Methodologies]: Algorithms 

General Terms 

Algorithms 
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1. INTRODUCTION 

Robustness in biological organisms is one of the major 
characteristics which contributes to their survival in the en- 
vironment, maintaining its functions in face of external and 
internal perturbations [10[ Despite its importance, the 
complete understanding of the mechanisms behind this un- 
derlying property is still not possible, as tn vivo studies are 
a very hard challenge due to knowledge limitations. In this 
context, mathematical abstractions of the interactions that 
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constitute an organism are powerful tools to provide expla- 
nations about biological robustness, especially regarding the 
robustness present in some biological networks [13 22 



In general, it is expected that phenotype robustness is a 
consequence of a specific set of rules or patterns in the in- 
terrelationship among the genes of an organism |23j . Hence, 
theoretical models of Gene Regulatory Networks might be 
able to provide insights on their robustness. As a rela- 
tively simple and rich model, the Boolean Network (BN) 
model, which consider genes as on-off switches, is often suc- 
cessfully applied when details about gene-gene interactions 
are absent [l5][ll[l7||6l[Zl.lll[9l[lll[i6]. Among several 
key features, attractors (which captures the gene expression 
patterns that are periodically visited) and the size of their 
basins (which are made up of all the expression patterns 
that conduct to this attractors) are measures that can re- 
veal important characteristics of the underlying biological 
network [12[ |18| . More information regarding prospects and 
limitations of this paradigm can be found in the review writ- 
ten by Bornholdt [s]. 

The simplification of the Boolean formalism does not solve 
completely the problem of building a mathematical defini- 
tion of robustness [11| . A lot of measures have been proposed 
following different interpretations of the concept, such as 
Derrida curves [5], identification of Intrinsically Multivari- 
ate Prediction Genes [2l], and the size of the largest basin 
of attraction. In this work, we consider robustness as the 
ability of executing the same activity despite random fluc- 
tuations in a limited number of genes. Hence, a network 
with a large basin of attraction might be considered robust, 
and the the size of the largest basin of attraction (A) a good 
estimation for its robustness. 

The ability to measure the largest basin of attraction of 
a BN in a fast and reliable way will cause a huge impact in 
the robustness characterization of the most of the available 
Genetic Networks, which can easily be composed of more 
than 6000 genes. However, measuring A imposes its own 
challenge, as an exhaustive search has an exponential com- 
plexity in the number of genes, making it impractical for 
large network sizes. In Brun et al. 4 an estimation of A 
is calculated from the steady state distribution of a Proba- 
bilistic Boolean Network, but also with an exponential com- 
plexity in the number of genes (nodes). This work proposes 
a Monte Carlo approach to calculate A that can be used 
regardless the network size. It is showed that the proposed 
methodology is reliable through finite-size analysis. The ob- 
tained results were validated by using a full search on small 
networks. 



2. BOOLEAN NETWORK 

A Boolean Network (BN) is defined by a set of n Boolean 
variables X = {x\,X2, ■ ■ ■ , Xn} and a set of n Boolean func- 
tions F = {/i, /2, • • • , /n}- In our context, each variable 
Xi represents a gene, and it can assume only two possible 
values: (OFF) or 1 (ON). The value of gene Xi at time 
i + 1 is obtained from the values of a set of predictor genes 
(i) ' ■''J2(») 1 • • • i^jfc (»)} ^ ^ at time t through the 
Boolean function fi : {0, 1}'"' — >■ {0, 1} that belongs to F. 

The number of predictors of Xi £ X is called the in- degree 
of Xi, and the out-degree of Xi is defined as the number of 
genes in X for which Xi is one of their predictors. We con- 
sider in this work that all genes are updated synchronously 
by the functions in F assigned to them. 

A state of a BN at time t is a binary vector s{t) — 
{xi{t), . . . ,Xri{t)), representing the value of all Boolean vari- 
ables in the network, roughly describing the genetic activity 
of the organism (all genes) at a certain time. In this way, 
the number of all possible states is 2". Keeping the Boolean 
function that regulates each gene fixed, and without consid- 
ering any random fiuctuation in the value of the genes, a BN 
is a deterministic formulation. 

The dynamics of a BN can be represented by a directed 
network, called state transition diagram, in which its nodes 
correspond to the states of the BN and an arc from one 
node to another corresponds to a state transition between 
them. A set of states that is periodically visited is called 
an attractor of the BN, and all states that eventually lead 
to this attractor (and including all states in the attractor) 
are called the basin of attraction. In this work, consider 
the number of states in the basin of attraction as its size. 
Finally, define A as the size of the largest basin of attraction 
divided by the total number of states. 



3. METHOD DESCRIPTION 

In our method, we assume that the largest basin of at- 
traction is also the easiest to identify, in the sense that a 
randomly chosen state is more likely to be part of it. Hence, 
the probability that a randomly chosen state is the largest 
basin of attraction is equal to the fraction A of states in this 
basin. 

To estimate A, a Monte Carlo approach is proposed for 
large networks since a full search is impracticable. We draw 
a number Z of states and for each of them the attractor 
is identified. The most frequent attractor has Z* random 
states that conduct to it, hence we can assume that the 
ratio Z* /Z is a good estimator for A, i.e., the size of the 
largest basin divided by the total number of states. 

The number of states in an attractor, as well as the average 
number of time steps that a state in the basin of attraction 
takes to reach its attractor, were numerically estimated as 
the function y{n) = O.OOSn^'^^^''. This in agreement with re- 
sults provided by Kauffman [9], who considered the attrac- 
tor size as a polynomial function of the number of nodes. 
Hence, by considering a randomly chosen network state, we 
can assume that the number of time steps that it will take to 
reach all the states in its attractor (i.e., to reach one state in 
the attractor and, from that state, visit all states in the at- 
tractor), is, for most of the cases, 2y{n). This upper bound 
implies that the complexity of our algorithm is 2y{n)Z in 
the average case. 



4. RESULTS AND DISCUSSION 

The proposed methodology was executed considering dif- 
ferent network sizes: n from 10 to 26. Each gene regula- 
tory network is generated as a random graph (Erdos-Renyi 
(ER) directed graph, with in-degree fixed), characterized by 
a Poisson distribution of out-degrees. For comparison, we 
perform a full search of the largest basin of attraction on 
the smaller gene network sizes, n < 20. For each gene reg- 
ulatory network size, the value of A is the average of 100 
randomly generated networks. 
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Figure 1: Effect of the number of initial random 
states Z on the estimated largest basin of attrac- 
tion A. Purple squares represent our Monte Carlo 
approach, with the light purple line as a Linear Fit- 
ting of the data. In the limit of infinite Z states, the 
extrapolation of our method is very similar to the 
full search estimation (red line). The value of A is 
an average over 100 BNs of size n = 20. 
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Figure 2: Effect of the size n of the BN on the es- 
timated largest basin of attraction A. Purple/red 
squares represent our Monte Carlo approach/FYill 
search, with the light purple/orange line as a Linear 
Fitting of the data. In the limit of infinite size, the 
extrapolation of both methods are very similar. The 
value of A is an average over 100 BNs. The Monte 
Carlo method was executed with Z = 1000. 



In our proposal, the number of initial random states Z is 
a very important parameter to be determined, s&ii Z <^ 2" 
our method is much less costly than the exponential com- 
plexity of the full search. By computing the value of A 
against 1 jZ in Figure[lJ we show that; a value for Z, as small 
as 1000, already provides a good estimation for a network 
of 2^" states, and that the full search estimation, A = 0.76, 
is recovered for a sufficient large value of Z. 

For a fixed value of Z at 1000, we measure A for different 
network sizes in Figure |2] which also presents the value of 
A calculated through a full search for small network sizes. 
Our proposal estimates A within the interval [0.73,0.81], in 
accordance with the full search, despite the fact that Z = 
1000 is much smaller than 2", for n > 15. Both methods also 
point to a similar value of A = 0.77 in networks of infinite 
size. 

5. CONCLUSION 

We describe in this work a fast and reliable strategy to 
measure the largest basin of attraction of a BN. The pro- 
posed method has a polynomial complexity on the number 
of states and the obtained results is in full agreement with 
full search results for small networks, in the limit of an infi- 
nite number of genes. Besides that, our work is a proof-of- 
concept that Monte Carlo estimation might be successfully 
applied in measures related to Boolean Networks. 

We hope with this method to provide a powerful tool to fu- 
ture studies about robustness in biological organisms. From 
this, it is possible to compare, for instance, the robustness of 
the Gene Regulatory Networks of the yeast with a random 
network of the same size, roughly with 6000 genes, possibly 
identifying features that enhance the size of the largest basin 
of attraction. 

There are rooms for improvement in our method as well. 
As shown by Linch 19 20], the average size of the attrac- 
tors y(ri) is superpolinomial in the number of nodes. Hence, 
for large values of n, the complexity of our method could 
be better estimated if the function i/(n) were precisely de- 
scribed. Besides that, the choice of Z could be improved, 
as a better estimation could be made considering Z as an 
increasing function of n. 

It is known that some biological networks have different 
topologies, such as scale-free [24| |2] and small- world |25| . 
Another possible direction would be to apply the proposed 
methodology in larger networks with different topologies. 
Also, topological characteristics of the state transition dia- 
gram could be used to improve the efficiency of our method, 
for instance considering more tractable BNs where the input 
of each Boolean function is equal to one. 
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